Combined hash and fill AES loop #166
Adds more parallelism to the AES loop so that modern CPUs can take advantage of it. In addition, scratchpad data moves between the L1 and L3 caches only once, which saves time and energy per hash.
- void randomx_calculate_hash_first(randomx_vm* machine, uint64_t (&tempHash)[8], const void* input, size_t inputSize) {
+ void randomx_calculate_hash_first(randomx_vm* machine, uint64_t *tempHash, const void* input, size_t inputSize) {
For typechecking purposes you could still typedef this param to uint64_t foo[8] and use the type in the parameter lists.
Yeah, uint64_t tempHash[8] could be more readable in the API and it decays into a pointer anyways.
And that would have prevented that sizeof() bug too.
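One caveat on the parameter type, sketched below: an array-style parameter such as `uint64_t tempHash[8]` is adjusted to a plain pointer, so `sizeof` inside the function measures the pointer, while a reference-to-array parameter keeps the full extent. The helper names here are hypothetical and exist only for illustration; they are not part of the RandomX API.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// The language treats an array-style parameter exactly like a pointer parameter,
// so these two function types are identical.
static_assert(std::is_same<void(uint64_t[8]), void(uint64_t*)>::value,
              "uint64_t[8] parameter is adjusted to uint64_t*");

// Hypothetical helper: pointer parameter (what uint64_t tempHash[8] becomes).
// sizeof here is the size of a pointer, not of the 8-element buffer.
inline std::size_t size_seen_by_pointer_param(uint64_t* tempHash) {
    return sizeof(tempHash);
}

// Hypothetical helper: reference-to-array parameter.
// The extent is part of the type, so sizeof covers all 8 elements,
// and passing a buffer of any other length fails to compile.
inline std::size_t size_seen_by_reference_param(uint64_t (&tempHash)[8]) {
    return sizeof(tempHash);
}
```

So a typedef'd array parameter helps readability, but only the reference form gives `sizeof` the whole buffer and lets the compiler reject a wrongly sized argument.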
When I use the "first" and "next" hash functions from randomx.cpp, my miner (which is based on this repo, not xmrig) receives "Low difficulty share" from the pool.
You have to be careful when using these new functions. Check how benchmark.cpp handles them and where it updates the nonce.
My miner works fine with this workflow:
uint64_t tempHash[8];
while (nonce < noncesCount) {
    nonce = atomicNonce.fetch_add(1);
    store32(noncePtr, nonce);
    randomx_calculate_hash_first(vm, tempHash, blockTemplate, sizeof(blockTemplate));
    randomx_calculate_hash_next(vm, tempHash, blockTemplate, sizeof(blockTemplate), &hash);
    result.xorWith(hash);
}
But when the call to randomx_calculate_hash_first is moved out of the while loop, the miner gets the wrong result:
uint64_t tempHash[8];
store32(noncePtr, nonce);
randomx_calculate_hash_first(vm, tempHash, blockTemplate, sizeof(blockTemplate));
while (nonce < noncesCount) {
    nonce = atomicNonce.fetch_add(1);
    store32(noncePtr, nonce);
    randomx_calculate_hash_next(vm, tempHash, blockTemplate, sizeof(blockTemplate), &hash);
    result.xorWith(hash);
}
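A possible explanation for the second variant, sketched with a toy model: if the "next" call outputs the hash of the input given to the previous call and only starts work on the new input (which is how I read the pipelined design, and why the nonce update position in benchmark.cpp matters), then each hash must be attributed to the previous nonce, not the one just stored. Everything below (`ToyVm`, `toy_hash`, `toy_hash_first`, `toy_hash_next`, `hash_range`) is a hypothetical stand-in, not the RandomX API, and the mixing function is not a real hash.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a VM: remembers the input whose hash is still in flight.
struct ToyVm {
    uint32_t pending = 0;
};

// Hypothetical placeholder for a real hash; it only mixes the nonce.
inline uint64_t toy_hash(uint32_t nonce) {
    return (nonce + 1ull) * 0x9E3779B97F4A7C15ull;
}

// Starts hashing `nonce`; no output is produced yet.
inline void toy_hash_first(ToyVm& vm, uint32_t nonce) {
    vm.pending = nonce;
}

// Outputs the hash of the input from the PREVIOUS call, then starts on nextNonce.
inline uint64_t toy_hash_next(ToyVm& vm, uint32_t nextNonce) {
    uint64_t out = toy_hash(vm.pending);
    vm.pending = nextNonce;
    return out;
}

// Pipelined loop over [begin, end): each hash is paired with the nonce it belongs to.
inline std::vector<std::pair<uint32_t, uint64_t>> hash_range(uint32_t begin, uint32_t end) {
    std::vector<std::pair<uint32_t, uint64_t>> results;
    if (begin >= end) return results;
    ToyVm vm;
    toy_hash_first(vm, begin);
    for (uint32_t nonce = begin; nonce < end; ++nonce) {
        // The value returned here is the hash of `nonce`, not of `nonce + 1`.
        uint64_t h = toy_hash_next(vm, nonce + 1);
        results.emplace_back(nonce, h);
    }
    return results;
}
```

Under that assumption, a miner that submits the hash together with the nonce it has just written would pair every hash with the wrong nonce, which a pool would plausibly report as a low difficulty share. The first variant avoids this only by restarting the pipeline on every iteration, which gives up the benefit of the paired calls.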